Failure metrics for input diagnostics #20933

patrickmann · 2024-11-11T17:25:10Z

/nocl new feature
Relates to issue #20683

Adds meters for every input that experiences processing and/or indexing failures. The meter names are:
org.graylog2.<input Id>.failures.indexing
org.graylog2.<input Id>.failures.processing

These metrics are available in Open and Enterprise, independent of configuring the Processing and Indexing Failure stream.
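
A minimal sketch of the naming scheme described above, for illustration only. The class `InputFailureMeters` and its methods are hypothetical and are not part of this PR. A plain `LongAdder` map stands in for the server's real metric registry so the example stays self-contained. The names follow the description as written; a later commit in this PR is titled "change metric prefix", so the merged names may differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a metric registry; not Graylog's actual class. */
class InputFailureMeters {
    // One counter per meter name, created lazily on the first failure.
    private final Map<String, LongAdder> meters = new ConcurrentHashMap<>();

    /** Builds the meter name from the PR description for one input and failure kind. */
    static String meterName(String inputId, String kind) {
        return "org.graylog2." + inputId + ".failures." + kind;
    }

    void markIndexingFailure(String inputId) {
        mark(meterName(inputId, "indexing"));
    }

    void markProcessingFailure(String inputId) {
        mark(meterName(inputId, "processing"));
    }

    /** Returns the number of failures recorded under the given meter name, or 0 if none. */
    long count(String name) {
        LongAdder adder = meters.get(name);
        return adder == null ? 0L : adder.sum();
    }

    private void mark(String name) {
        meters.computeIfAbsent(name, k -> new LongAdder()).increment();
    }
}
```

Each input gets its own pair of counters, so a failure on one input never changes the counts of another.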

AntonEbel

I think this check could be a problem. I don't know whether we need it here, but if we do, the metric update has to happen before it, at least for processing.

patrickmann · 2024-11-13T13:15:55Z

Moved the metric code up into FailureSubmissionService. It now runs ahead of any checks.
Verified that the metrics are still updated when the config is set to disabled and when there is no license.
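
The ordering described in this comment can be sketched roughly as follows. This is not the code from FailureSubmissionService: the class `FailureSubmissionSketch`, the `failureHandlingEnabled` flag, and the counters are assumptions made up for illustration. The only point is that the meter is updated before the early return.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of "metrics first, checks second"; not the real service. */
class FailureSubmissionSketch {
    private final LongAdder failureMeter = new LongAdder();
    private final LongAdder submitted = new LongAdder();
    // Assumed stand-in for the configuration and license checks.
    private final boolean failureHandlingEnabled;

    FailureSubmissionSketch(boolean failureHandlingEnabled) {
        this.failureHandlingEnabled = failureHandlingEnabled;
    }

    void submitProcessingFailure(String messageId) {
        // Update the metric first so it is counted regardless of the checks below.
        failureMeter.increment();

        if (!failureHandlingEnabled) {
            return; // the failure is not forwarded, but it was already counted
        }
        submitted.increment();
    }

    long failureCount() {
        return failureMeter.sum();
    }

    long submittedCount() {
        return submitted.sum();
    }
}
```

With the flag set to `false`, `failureCount()` still increases on every call while `submittedCount()` stays at zero.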

AntonEbel

I think we should add the metrics after the message.supportsFailureHandling() method. I've looked at the implementations of this interface method and it looks like messages from the inputs here always return true. Even if we extend the diagnostics page one day to show detailed information from the indexing/processing failure index, hopefully we won't have a discrepancy in the amount of messages then.

graylog2-server/src/main/java/org/graylog/failure/FailureSubmissionService.java

patrickmann · 2024-11-13T14:52:13Z

> Even if we extend the diagnostics page one day to show detailed information from the indexing/processing failure index, hopefully we won't have a discrepancy in the amount of messages then.

I was thinking it preferable to show a failure count, even if the messages don't support failure handling. At least you have an indication that something is wrong. I get your point of avoiding a discrepancy in counts, but I'd still prefer to see the "true" failure count. If this becomes an issue, we can hopefully label the data appropriately on the UI so it is not confusing to the user.
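
To make the trade-off in this thread concrete, here is a small sketch comparing the two placements. The `Message` interface below is a hypothetical one-method stand-in that models only `supportsFailureHandling()`; the helper names are invented for illustration and do not appear in the PR.

```java
import java.util.List;

/** Hypothetical comparison of counting before versus after the check. */
class FailureCountPlacement {
    /** Minimal stand-in; only the method named in the review is modeled. */
    interface Message {
        boolean supportsFailureHandling();
    }

    /** Counts every failed message, the "true" failure count preferred above. */
    static long countBeforeCheck(List<Message> failed) {
        return failed.size();
    }

    /** Counts only messages that pass the check, as suggested in the review. */
    static long countAfterCheck(List<Message> failed) {
        return failed.stream().filter(Message::supportsFailureHandling).count();
    }
}
```

The two counts differ only by the number of failed messages that return `false` from the check. If messages from inputs always return `true`, as the reviewer observed, both placements report the same number.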

graylog2-server/src/main/java/org/graylog/failure/FailureSubmissionService.java

AntonEbel

LGTM

patrickmann marked this pull request as ready for review November 12, 2024 08:13

patrickmann requested a review from AntonEbel November 12, 2024 08:13

AntonEbel requested changes Nov 12, 2024

patrickmann added 3 commits November 13, 2024 12:43

failure metrics

126291e

adjust unit test

db26a23

move metrics code ahead of config check

f12ea58

patrickmann force-pushed the failureMetrics branch from 8966f4a to f12ea58 November 13, 2024 13:11

patrickmann requested a review from AntonEbel November 13, 2024 13:16

AntonEbel requested changes Nov 13, 2024

graylog2-server/src/main/java/org/graylog/failure/FailureSubmissionService.java (resolved)

graylog2-server/src/main/java/org/graylog/failure/FailureSubmissionService.java (resolved)

refactor

b226780

patrickmann requested a review from AntonEbel November 14, 2024 11:39

AntonEbel requested changes Nov 14, 2024

graylog2-server/src/main/java/org/graylog/failure/FailureSubmissionService.java (outdated, resolved)

graylog2-server/src/main/java/org/graylog/failure/FailureSubmissionService.java (outdated, resolved)

change metric prefix

1d091c7

patrickmann requested a review from AntonEbel November 15, 2024 07:55

AntonEbel approved these changes Nov 15, 2024

AntonEbel merged commit ca8452a into master Nov 15, 2024
6 checks passed

AntonEbel deleted the failureMetrics branch November 15, 2024 09:30
